Advanced SQL Injection TechniquesAdvanced SQL Injection Techniques
Mike Shema
SQL Injection attacks target the core of a web
application: its database. Their most significant impact enables an
attacker to retrieve, modify, or delete arbitrary data. It is a serious
threat to any application with a database back-end and a threat that
should be fully understood in order to develop adequate countermeasures.
Every web server administrator must acknowledge
techniques that can be used to identify an SQL Injection vulnerability
(see Tobias Glemser's Article SQL Injection Attacks with PHP and MySQL,
hakin9 03/2005) and assess the scope of its risk. The basic methodology
for an SQL Injection attack is to identify a potential vector, then
exploit that vector with customized SQL queries - all through the web
browser.
Identification of the potential for a vulnerability is
important, but even more important is the ability to evaluate its
impact. In some cases, a SQL Injection vector may offer nothing more
than the capability to generate some syntax errors, such as trying to
convert strings to numeric values. In other cases, the vector may
enable the attacker to fully compromise a database's information.
Although the examples refer to MySQL databases, the techniques apply to
any database platform and, in most cases, can be applied without
modification. The core of these techniques targets the SQL language.
Certain database extensions merely make these techniques easier to
accomplish.
To refresh the memory
SQL Injection tests can be classified into three categories based on which aspect of the query is targeted:
-
attack the syntax of the query -
insert common SQL characters with the intention of generating errors to
identify potential attack vectors,
-
attack
the syntax of the language - target the SQL language itself in order to
generate database errors or perform simple queries by manipulating
language constructs and semantic identities,
-
attack
the logic of the query - rewrite the query to retrieve arbitrary data
from tables to which developers did not intend access.
These techniques can be combined to assess a web
application and determine its vulnerability to SQL Injection attacks.
In the next sections the SQL Injection payloads are presented without
the entire URL as an example. This makes it easier to understand the
techniques without cumbersome parameters and text.
This is also because the injection of these payloads is
quite simple. Given a URL of the form
http://site/page.cgi?a=foo&b=bar, a SQL Injection attack replaces
the vulnerable parameter's value with its payload:
http://site/page.cgi?a=&b=bar. As a
further reminder, one has to remember to encode spaces and other
characters in the payload so that they do not disrupt the syntax of the
URL.
Attack the syntax of the query
The single quote, while arguably the most popular
character for identifying SQL Injection vectors, is by no means the
only character necessary to generate a database error. This technique
encompasses most fundamental tests for potential vulnerabilities by
using SQL language metacharacters or formatting characters to disrupt
the syntax of the original query. For example, the following statements
cannot be parsed into valid queries because they have an ill-formed
syntax due to an unterminated single quote:
-
SELECT foo FROM bar WHERE a = ''';,
-
SELECT foo FROM bar WHERE a = '/*;,
-
SELECT foo FROM bar WHERE a = ';--;,
-
SELECT foo FROM bar WHERE a = '#;.
While the most common example is the single quote
character (ASCII 0x27), many characters can be used to disrupt the
syntax including:
Validation filters that only prohibit single quote
characters (or some small set of characters) might prevent full
exploitation of a vulnerability, but such filters are often inadequate.
They may simply obscure more fundamental problems with the
application's database connection architecture.
Quotes vs. slashes
PHP developers face several challenges and potentially
confusing recommendations when creating strong input validation
filters. PHP's magic_quotes() function automatically escapes all single
quotes with a backslash character; however, if this feature is combined
with a call to the strip_slashes() function, then the escape characters
have been removed:
-
SELECT foo FROM bar WHERE a = '\''; - single quote escaped,
-
SELECT foo FROM bar WHERE a = '''; - backslash stripped, query ill-formed.
The other danger of focusing on the single quote
character is that developers may not be aware of the full range of
characters and techniques available to an attacker for exploiting a SQL
query. The attacker can combine SQL functions to generate errors in the
syntax of a query.
You can also use inherent SQL functions to generate errors. The SQL CHAR()
function prints the ASCII equivalent of the argument. An attacker may
be able to inject quote characters by using odd or even amounts of CHAR(0x27)
strings (hexadecimal 0x27 represents the ASCII code for the single
quote). This is important, because the attack consists of alphanumeric
characters plus the parentheses. Consequently, monitoring input for quote characters will not catch or block the attack.
Variables may vary
Database-related errors can also be generated by
attacking variable types. This is most effective against numeric
values, but is also successful against date or time variables. For
example, here is a list of different values that you may try against
parameters that expect decimal numbers:
-
8-, 16-, 32- and 64-bit values - 256, 65536, etc.,
-
integer overflows - 2^8 + 1, 2^16 +1, 2^32 + 1, or 2^64 + 1,
-
unsigned vs. signed values - inserting negative values,
-
floating-point overflows - for example 3.40282346638528860e+38, 1.79769313486231570e+308,
-
alternate presentation - binary, octal, hexadecimal, or scientific notation.
These numeric attacks often succeed in generating errors
because the variables used to track these values are not strongly
typed. In PHP the parameter type of all $_REQUEST variables is a string. This means that, although you can perform arithmetic operations on variables ($a = 1; $a++),
the actual type of the variable may be considered a numeric string. The
variable may even be silently promoted from a number to a numeric
string when the value would normally result in an overflow, inf (infinity), or NaN (not a number) equivalent. For example, PHP's is_numeric("1e308") function returns true (it is a number), but is_numeric("1e309")
returns false - neither a number or numeric string because it is beyond
the double float type that PHP supports. A variable must be set to
numeric explicitly using the settype() function, but beware that large values may return a value of inf - which can also lead to errors in the query if it is expecting numerals.
Fighting the synonyms
Robust input validation filters can be an effective
countermeasures to these techniques, but they are not sufficient.
Database errors and other exceptions should be trapped and prevented
from being sent to the browser. Verbose error information tends to
provide useful information for malicious users targeting a database. As
we will see a bit later, input validation filters may be inadequate.
For example, we have already seen that the value 1e309 is not a number
(for most languages and SQL databases) and will generate an error in
less secure applications. Yet 1e309 does not contain any characters
that are normally malicious. It is a purely alphanumeric value.
Note that SQL is a rich language that provides an attacker to create many synonomous permutations. For example, CHAR(0x27) is equivalent to ASCII(0x27) which can also be written as x'27. We focus on using the CHAR(0x27)
string to avoid raw quotes in the payload, but the specifics of each
test are highly mutable. This also implies that syntax-based filtering
- such as application-layer firewalls - must be very robust in order to
prevent these attacks. In fact, the combination of alternate encoding
schemes (URL encoding, Unicode) and creative SQL will bypass most
pattern-matching filters. Remember, CHAR(0x27) is the same as cH%41r(0x68-0x41).
Semantic doppelgangers - attack the syntax of the language
In SQL, Shakespeare's observation of roses might look like the decidedly unpoetic:
SELECT name FROM roses
WHERE scent='sweet';
Whether a rose might be called shoe, bumblebee, or clock,
its sweet-smelling attribute remains unchanged. SQL provides a rich set
of functions that can be used to create semanticly equivalent queries
that look quite different textually. This capability enables an
attacker to identify and exploit injection vulnerabilities even when
the server does not reveal error information or similar output.
While it is useful to break queries in order to find
potential vulnerabilities, it is also profitable to attack the query
using the semantics of built-in SQL functions. Thus, instead of
attacking the parser of the application language (PHP, JSP, etc.), the
attack focuses on the SQL language itself. This has the added benefit
of not only identifying attack vectors, but also provides more
information about the input validation filters used by the application.
Another byproduct of this technique is the ability to perform blind SQL Injection attacks, or attacks that do not rely on error generation in order to identify or exploit.
Numeric data types
Numeric data types are the easiest candidates to test
with this technique. Figure 1 shows the original example URL, while
Figures 2 and 3 present modified addresses We are using an older,
insecure version of FreznoShop online shopping system - releases newer
than 1.4 branch are quite invulnerable.
Consider the following list of name/value pairs:
-
rowid = 111,
-
rowid = 0x6f,
-
rowid = 0157 (octal representation),
-
rowid = 110+1 (use 110%2b1 in practice because the + stands for a space character in the URL),
-
rowid = 112-1,
-
rowid = MOD(111,112),
-
rowid = REPEAT(1,3),
-
rowid = COALESCE(NULL,NULL,111).
Figure 1. The original example URL
Figure 2. Modified URL string
Figure 3. The same string modified with usage of MOD() function
From a database's point of view, each one of these
requests results in the same value: 111. Also notice that none of these
rely on the single quote character. The first three look like numeric
or alphanumeric strings, the next two have apparently innocuous
characters for the addition and subtraction symbols, and the final
three include parentheses and a comma. If input validation were to
focus on stripping the single quote, then a vulnerable application
would gain no benefit from such a countermeasure.
Raw parameters
This technique, which uses semantic doppelgangers,
enables the user to identify SQL Injection vectors. If the result of
each request is identical, then it can be assumed that the application
engine has parsed the raw parameter value and inserted it into the
underlying SQL query. For example, consider this query for a rowid:
SELECT foo FROM table
WHERE rowid = 110+1;
The database calculates 110+1 = 111 before resolving the
rest of the query, according to its order of operations. This bears the
same result as the original query:
SELECT foo FROM table
WHERE rowid = 111;
Before we explain how to extend this attack to extract
arbitrary data, let us first examine some other cases that can be used
for error generation. Even though this technique does not require us to
generate database errors, such information is useful to determine
versions and names of tables or columns. If the application's input
validation filters have stripped quote characters, but not trapped
database errors, then we can target incorrect SQL function syntax. For
example:
Of course, numeric values should also be tested for boundary conditions as mentioned in the previous section.
Premature termination characters
This technique lends itself to the creation of custom SQL
queries. Such queries often do not require quote characters, but often
require premature termination characters. Thus, a request might employ /* or -- in order to truncate additional, undesired statements. A string SELECT foo FROM table WHERE rowid = MOD(111,112)+UNION+SELECT+USER()/*; is a good example.
String values present a greater challenge because there
are fewer functions in the SQL language that provide helpful semantic
doppelgangers. The CONCAT() function is useful for these cases. In cases where the string argument only contains the letters a-f , the HEX() function can be used:
-
op=add,
-
op=HEX(2781),
-
op=REVERSE(dda),
-
LEAST(0x6d75736963,0x6e75736963),
-
GREATEST(0x61,0x6d75736963).
Once again, we have consciously chosen to avoid using
quote characters because they set off alarms or may be blocked. Yet
this doesn't prevent us from creating complex strings. The REVERSE(), LEAST(), and GREATEST() functions only need parentheses and commas. The following examples are all semantically identical:
-
page.cgi?category=music,
-
page.cgi?category=REVERSE(cisum),
-
page.cgi?category=GREATEST(0x61,0x6d75736963),
-
page.cgi?category=LEAST(0x6d75736963,0x6e75736963).
Countermeasures
The best countermeasures for these attacks use input
validation filters and strong data types when assigning user-supplied
values to query parameters. Even though 0x27 is a valid hexadecimal
value, it should be prohibited by the application because the raw value
contains a non-numeric character (or possibly silently coaxed into 27
decimal). Likewise, octal 0157 should either be denied because of the
leading zero, or the leading zero could be stripped so the value
becomes 157 decimal, which is merely a different row number. At the
very least, developers should be aware of alternate bases and
understand where they are interpreted: either in the application
language or in the database.
It's very easy to handle all user-supplied data as
strings, but if the data are to be inserted into a query, then they
should be explicitly assigned (cast) to the appropriate data type. For
interpreted languages such as PHP, Perl, C#, or Visual Basic the
assignment should be safe or generate a conversion error. If the web
application uses a compiled language such as C or C++, then the type
casting should be handled carefully and checked for exceptions (think
of format-string attacks).
Attack the logic of the query
Breaking the syntax of a query is useful for identifying
SQL Injection vulnerabilities, but it only demonstrates the existence
of a problem. Arbitrary data access is the true risk associated with
SQL Injection attacks.
MySQL supports a specific comment macro that triggers on
the database version /*! */, where is a
5-digit value that represents the MySQL build. For example, version
3.23.02 looks like 32302, version 4.1.10 looks like 40110, and version
5.0.3 looks like 50003. The most immediate way to test for embedded SQL
attacks with MySQL is to combine the comment extension with a statement
that ensures the query will fail:
Then, one can flip the query and ensure that it succeeds
in order to verify the injection vector - /*!32302+AND+1+*//* (it may
be necessary to terminate the query).
UNION SELECT
Once a parameter has been identified as a vector for SQL
Injection attacks, the next step is to determine the amount to which
the database is exposed. This is accomplished by manipulating the logic
of the original query. Most basic queries are of the form SELECT foo FROM bar WHERE a=b; in which the b of a=b clause is the parameter that can be manipulated. Consequently, the new query must consider the previous SELECT. The quickest technique is to use the UNION keyword.
The UNION statement combines multiple SELECT statements and is supported by most databases. The basic form looks like SELECT foo FROM bar WHERE a=b UNION SELECT foo2 FROM bar2 WHERE c=d;.
One useful UNION clause is to display the user name under which the database connection has been established. On MySQL you would do this with SELECT USER(). Inside a UNION clause the request might look like
SELECT text FROM articles
WHERE id=0
UNION SELECT USER();
Several challenges present themselves when using UNION statements for SQL Injection attacks:
-
the UNION clause should terminate the query to ensure valid syntax - any additional logic must be truncated,
-
UNION statements require matching column counts in each SELECT clause.
The first challenge is relatively easy to accomplish.
Simply use one of the common terminators described in the previous
section. This can be a comment delimiter (#, /*, --) in combination - if necessary - with a semicolon or single quote.
Columns and bears
The second challenge is not difficult to overcome, but
requires a few iterative steps remniscent of Goldilocks and the three
bears. The injected UNION
clause will either have too few columns or too many of them - what you
need is a number that is just right! If you can observe the database's
error messages, then you'll see something like The used SELECT statements have a different number of columns.
Column undercounts can be fixed by adding extra columns or column place-holders to the SELECT statement (see Figure 4). For example, consider the following statements:
-
SELECT user FROM mysql.user,
-
SELECT 1,user FROM mysql.user,
-
SELECT 1,1,user FROM mysql.user,
-
SELECT user,user,user,user FROM mysql.user.
Figure 4. A successful UNION SELECT attack
Each one of these queries is designed to grab the user name (or names) from the default mysql.user
table. The number of columns increases from one to four in each
example. In practice, it is better to repeat the column name to ensure
that the value is displayed in the application. The first placeholder
works, but it's hard to tell which column the web application will
display.
Column overcounts can be addressed by using the CONCAT
statement. Overcounts occur when the first SELECT statement expects
fewer columns than your custom query. The CONCAT statement resolves
this by concatenating each column into a single string. Thus, multiple
columns are reduced to a single column. For example:
SELECT foo FROM table
WHERE a=b
UNION SELECT CONCAT(*)
FROM mysql.user;
This can be combined with the undercount technique when necessary:
SELECT foo,bar FROM table
WHERE a=b
UNION SELECT 1,CONCAT(*)
FROM mysql.user;
The major caveat is that any NULL value in one of the column results will cast the CONCAT string to NULL.
Aim at rows
Once you have matched column counts for the query, the
next step is often to specify an arbitrary row to retrieve from a
table. When the query returns multiple rows, often only the first one
is displayed. To some degree, a good WHERE
clause can help target specific rows, but only if the table's general
structure (column names) is known before-hand. A much easier method
uses offsets within the LIMIT clause. You can limit the result to one row by using LIMIT 1, but you can control which row is returned by adding the optional offset beginning with 0. For example:
-
SELECT foo FROM table WHERE a=b UNION (SELECT CONCAT(*) FROM mysql.user LIMIT 0,1);,
-
SELECT foo FROM table WHERE a=b UNION (SELECT CONCAT(*) FROM mysql.user LIMIT 1,1);,
-
SELECT foo FROM table WHERE a=b UNION (SELECT CONCAT(*) FROM mysql.user LIMIT 2,1);.
You can progress through the offsets until the query returns a NULL row. Unlike the previous examples of simple queries, it is necessary to place parentheses around the clause that contains the LIMIT statement. Otherwise it will be incorrectly applied to the entire query.
Defence by statements
The use of prepared statements (also known as
parameterized queries) or stored procedures are effective
countermeasures to these techniques because they separate the logic of
the query from the data of the query. Consequently, injection attacks
can corrupt the original SQL query, but will not be able to rewrite it
in such a manner that arbitrary tables or data can be accessed.
A potential drawback of prepared statements is that they
require additional set-up within the application. This could lead to a
performance degradation; however, such an impact may be minimal. The
security gains are definitely good.
Help yourself and separate
Inadequate input validation filters are an integral part
of SQL Injection countermeasures, but they are often not the underlying
problem. Strong data typing (assigning numbers to numeric data types,
etc.) is also key, but string data always presents a challenge (see
Frame Additional SQL tricks).
Additional SQL tricks
Our core idea is to identify a SQL Injection
vulnerability via creative use of SQL formatting characters (syntax) or
SQL functions (semantics), then exploit the vulnerability by attacking
the SQL logic. Although it primarily focuses on numeric and string
manipulation, other functions can be used (or rather misused) to
generate errors for vulnerability identification:
-
INET_ATON(),
-
INET_NTOA(),
-
SOUNDEX().
Enumeration is another important part of SQL Injection;
one that is beyond our scope here. Nevertheless, here are some simple
queries that can be used to further determine information about a
database:
-
SHOW VARIABLES,
-
SHOW STATUS,
-
SHOW DATABASES,
-
SHOW TABLES,
-
DESCRIBE
,
-
SELECT USER(),
-
SELECT SESSION_USER(),
-
SELECT CURRENT_USER(),
-
SELECT SYSTEM_USER(),
-
SELECT SUBSTRING_INDEX(USER(),'@',1),
-
SHOW CHARACTER SET,
-
SELECT CURDATE(),
-
SELECT CURTIME().
A more fundamental problem of SQL Injection is the lack
of separation between the query's logic and data. The logic is defined
by the developer and is expected to remain static. The data are
collected from the user. When the data and logic intermingle, such as
using string concatenation to build queries, then user-supplied data
can manipulate the logic of the query. This is the higher risk compared
to input validation, because a modified query provides access to
arbitrary data in the database. A formatting character maliciously
inserted into a stored procedure may merely produce a database error
instead of exposing the actual data. This is not meant to imply that
input validation is not important; however, any countermeasure to these
types of attacks should focus equally on query construction and
execution.
Without a comprehensive understanding of the different
techniques that attackers employ against web applications, developers
will not create effective countermeasures. From an assessment
perspective, auditors who do not adequately investigate the scope of a
SQL Injection vulnerability present an inaccurate view of the
application's risk - and if testing only relies on injecting single
quote characters, then the assessment may be useless. SQL Injection
attacks can be executed with many different characters.